
    Correlated learning for aggregation systems


    Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization

    In multi-agent systems with a large number of agents, the contribution of each agent to the value of other agents is typically minimal (e.g., aggregation systems such as Uber and Deliveroo). In this paper, we consider such multi-agent systems in which each agent is self-interested and takes a sequence of decisions, and we represent them as a Stochastic Non-atomic Congestion Game (SNCG). We derive key properties of equilibrium solutions in the SNCG model with non-atomic and nearly non-atomic agents. Using those equilibrium properties, we provide a novel Multi-Agent Reinforcement Learning (MARL) mechanism that minimizes the variance across values of agents in the same state. To demonstrate the utility of this new mechanism, we provide detailed results on a real-world taxi dataset and on a generic simulator for aggregation systems. We show that our approach reduces the variance in revenues earned by taxi drivers, while still providing higher joint revenues than leading approaches.
    Comment: arXiv admin note: substantial text overlap with arXiv:2003.0708
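
    The variance-minimization idea above can be made concrete with a minimal sketch. The following is a hypothetical tabular update, not the authors' implementation: each agent's TD target is mixed with the mean value of peer agents observed in the same state, which pulls per-state values together and so reduces their variance. The class name `VarMinAgent` and the mixing weight `beta` are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a variance-minimizing update:
# each agent's TD target is mixed with the mean value of peers in the same
# state, pulling per-state values together. `beta` and the class name are
# illustrative assumptions.
import numpy as np

class VarMinAgent:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, beta=0.5):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def update(self, s, a, r, s_next, peer_values):
        # Standard TD(0) target from this agent's own experience.
        td_target = r + self.gamma * self.q[s_next].max()
        # Mean value of the other agents currently in state s; mixing it in
        # reduces the variance of values across agents sharing that state.
        peer_mean = np.mean(peer_values) if len(peer_values) else td_target
        target = (1.0 - self.beta) * td_target + self.beta * peer_mean
        self.q[s, a] += self.alpha * (target - self.q[s, a])
```

    In the taxi setting, `peer_values` would be the value estimates of the other drivers located in the same zone at the same decision epoch.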

    Transferable Curricula through Difficulty Conditioned Generators

    Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as StarCraft, Go, and Chess. However, transferring knowledge from artificial "experts" to humans remains a significant challenge, and curricula are a promising avenue for such transfer. Recent methods for curriculum generation focus on training RL agents efficiently, yet they rely on surrogate measures to track student progress and are not suited to training robots in the real world (or, more ambitiously, humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments. Inspired by Item Response Theory, PERM seeks to model the difficulty of environments and the ability of RL agents directly. Given that RL agents and humans are trained most efficiently within the "zone of proximal development", our method generates a curriculum by matching the difficulty of an environment to the current ability of the student. In addition, PERM can be trained offline and does not employ non-stationary measures of student ability, making it suitable for transfer between students. We demonstrate PERM's ability to represent the environment parameter space, and training RL agents with PERM produces strong performance in deterministic environments. Lastly, we show that our method is transferable between students without any sacrifice in training quality.
    Comment: IJCAI'2
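
    To illustrate the Item Response Theory intuition behind PERM, here is a minimal sketch under the 1-parameter (Rasch) model: success probability is a logistic function of ability minus difficulty, and the curriculum picks the level whose predicted success rate sits closest to a target rate, a rough stand-in for the "zone of proximal development". The function names and the 50% target are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of the Item Response Theory (Rasch) intuition behind PERM;
# function names and the 0.5 target are illustrative, not the paper's code.
import math

def p_success(ability: float, difficulty: float) -> float:
    """1-parameter IRT model: logistic in (ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def pick_level(ability: float, difficulties: list) -> int:
    """Pick the level whose predicted success rate is closest to 50%,
    a rough stand-in for the 'zone of proximal development'."""
    target_p = 0.5
    return min(range(len(difficulties)),
               key=lambda i: abs(p_success(ability, difficulties[i]) - target_p))
```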

    Diversity Induced Environment Design via Self-Play

    Recent work on designing an appropriate distribution of environments has shown promise for training effective, generally capable agents. Its success is partly due to a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such an environment design framework often struggles to find effective levels in challenging design spaces and requires costly interactions with the environment. In this paper, we aim to introduce diversity into the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then used to characterize the diversity between two levels, which, as we show, can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate a self-play technique that allows the environment generator to automatically generate environments of great benefit to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance over existing methods.
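
    One plausible way to turn representative states into a diversity score between levels is a set distance over their state features. The sketch below is an assumption for illustration, not necessarily the measure used by DivSP: it computes a symmetric nearest-neighbour (Chamfer-style) distance between two sets of representative state features and greedily keeps the candidate levels farthest from the current training set.

```python
# Minimal sketch of scoring diversity between two levels from sets of
# representative state features; the Chamfer-style distance and the greedy
# selection rule are illustrative assumptions, not the paper's exact method.
import numpy as np

def level_diversity(states_a: np.ndarray, states_b: np.ndarray) -> float:
    """Symmetric nearest-neighbour distance between two (n, d) arrays of
    state features; larger values mean the levels exercise more distinct states."""
    d = np.linalg.norm(states_a[:, None, :] - states_b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def select_diverse_levels(candidates, training_set, k=1):
    """Greedily keep the k candidate levels farthest from the current
    (non-empty) training set -- an illustrative selection rule only."""
    scored = sorted(candidates,
                    key=lambda c: min(level_diversity(c, t) for t in training_set),
                    reverse=True)
    return scored[:k]
```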

    Neural Approximate Dynamic Programming for On-Demand Ride-Pooling

    On-demand ride-pooling (e.g., UberPool) has recently become popular because of its ability to lower costs for passengers while simultaneously increasing revenue for drivers and aggregation companies. Unlike in Taxi on Demand (ToD) services -- where a vehicle is only assigned one passenger at a time -- in on-demand ride-pooling, each (possibly partially filled) vehicle can be assigned a group of passenger requests with multiple different origin and destination pairs. To ensure near real-time response, existing solutions to the real-time ride-pooling problem are myopic in that they optimise the objective (e.g., maximise the number of passengers served) for the current time step without considering its effect on future assignments. This is because even a myopic assignment in ride-pooling involves considering which combinations of passenger requests can be assigned to vehicles, which adds a layer of combinatorial complexity to the ToD problem. A popular approach that addresses the limitations of myopic assignments in ToD problems is Approximate Dynamic Programming (ADP). However, existing ADP methods for ToD can only handle Linear Program (LP) based assignments, while the assignment problem in ride-pooling requires an Integer Linear Program (ILP) with poor LP relaxations. To this end, our key technical contribution is in providing a general ADP method that can learn from ILP-based assignments. Additionally, we handle the extra combinatorial complexity from combinations of passenger requests by using a Neural Network based approximate value function and show a connection to Deep Reinforcement Learning that allows us to learn this value function with increased stability and sample efficiency. We show that our approach outperforms past approaches on a real-world dataset by up to 16%, a significant improvement in city-scale transportation problems.
    Comment: Accepted for publication to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
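
    The scoring-and-assignment step the abstract describes can be sketched as follows: a learned value function scores each feasible (vehicle, request-combination) pair, and an ILP then picks a conflict-free assignment. In this minimal sketch the greedy loop is only a placeholder for a real ILP solver, and the `fare` field on requests and the `value_fn` callable are illustrative assumptions.

```python
# Minimal sketch: score every feasible (vehicle, request-combination) pair
# with immediate fares plus a learned value estimate, then assign without
# conflicts. The greedy loop is a placeholder for the ILP the paper uses;
# `fare` and `value_fn` are illustrative assumptions.
import itertools

def assign(vehicles, requests, value_fn, max_combo=2):
    pairs = []
    for v in vehicles:
        for k in range(1, max_combo + 1):
            for combo in itertools.combinations(requests, k):
                # Immediate reward plus the neural estimate of future value.
                s = sum(r["fare"] for r in combo) + value_fn(v, combo)
                pairs.append((s, v, combo))
    pairs.sort(key=lambda p: -p[0])
    taken_v, taken_r, plan = set(), set(), []
    for s, v, combo in pairs:  # greedy stand-in for the ILP
        if id(v) in taken_v or any(id(r) in taken_r for r in combo):
            continue
        taken_v.add(id(v))
        taken_r.update(id(r) for r in combo)
        plan.append((v, combo, s))
    return plan
```

    In the paper's ADP scheme, the realised assignments would in turn provide the targets for training the value function, closing the learning loop.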